A Hybrid Oriya Named Entity Recognition system: Harnessing the Power of Rule
Abstract
This paper describes a hybrid system that combines a maximum entropy (MaxEnt) model, a Hidden Markov Model (HMM), and a set of linguistic rules to recognize named entities in the Oriya language. The main advantage of our system is that it applies the MaxEnt model and the HMM in succession, together with manually developed linguistic rules. First, we use MaxEnt to identify named entities in the Oriya corpus and tag them provisionally as a reference. The MaxEnt-tagged corpus then serves as training data for the HMM, which performs the final tagging. Our approach can achieve higher precision and recall, given enough training data and an appropriate error-correction mechanism.
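The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The first stage here is a simple gazetteer lookup standing in for the MaxEnt tagger, and the tag names (`PER`, `LOC`, `O`), the function names, the add-one smoothing, and the toy English tokens are all assumptions made for illustration. What the sketch does show is the second stage: an HMM is estimated from the provisionally tagged corpus and then used for final tagging with Viterbi decoding.

```python
import math
from collections import Counter, defaultdict

def first_stage_tag(tokens, gazetteer):
    """Hypothetical stand-in for the MaxEnt stage: tag by gazetteer lookup."""
    return [gazetteer.get(tok, "O") for tok in tokens]

def train_hmm(tagged_sentences):
    """Count transitions and emissions from provisionally tagged sentences."""
    trans = defaultdict(Counter)   # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)    # emit[tag][word] -> count
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"               # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    return trans, emit, tags, len(vocab) + 1  # +1 slot for unseen words

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(tokens, model):
    """Return the most probable tag sequence for tokens under the HMM."""
    trans, emit, tags, vocab_size = model
    if not tokens:
        return []
    n_tags = len(tags)
    # Each column maps tag -> (best log score, backpointer to previous tag).
    columns = [{
        t: (log_prob(trans["<s>"], t, n_tags)
            + log_prob(emit[t], tokens[0], vocab_size), None)
        for t in tags
    }]
    for word in tokens[1:]:
        col = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + log_prob(trans[p], t, n_tags), p)
                for p in tags
            )
            col[t] = (score + log_prob(emit[t], word, vocab_size), prev)
        columns.append(col)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for col in reversed(columns[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]

# Toy data (hypothetical): stage 1 tags the corpus, stage 2 trains on those tags.
gazetteer = {"Ram": "PER", "Cuttack": "LOC"}
corpus = [["Ram", "lives", "in", "Cuttack"], ["Sita", "visited", "Cuttack"]]
provisional = [list(zip(s, first_stage_tag(s, gazetteer))) for s in corpus]
model = train_hmm(provisional)
print(viterbi(["Ram", "visited", "Cuttack"], model))
```

Because the HMM learns only from the first stage's output, any first-stage errors carry over into its parameters, which is one reason the abstract stresses an error-correction mechanism.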
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrid. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...
A Two Stage Language Independent Named Entity Recognition for Indian Languages
This paper describes the development of a two-stage hybrid Named Entity Recognition (NER) system for Indian languages, particularly Hindi, Oriya, Bengali, and Telugu. We have used both a statistical Maximum Entropy Model (MaxEnt) and a Hidden Markov Model (HMM) in this system. We have used a variety of features and contextual information for predicting the various Named Entity (NE) classes. T...
Improving Persian Named Entity Recognition Using Kasre Ezafe
Named entity recognition is a process in which people's names, place names (cities, countries, seas, etc.), organizations (public and private companies, international institutions, etc.), dates, currencies, and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
A Hybrid Approach for Named Entity Recognition in Indian Languages
In this paper we describe a hybrid system, designed for the IJCNLP NERSSEAL shared task, that applies a maximum entropy (MaxEnt) model, language-specific rules, and gazetteers to the task of named entity recognition (NER) in Indian languages. Starting with named entity (NE) annotated corpora and a set of features, we first build a baseline NER system. Then some language specific rules are added to th...
Aggregating Machine Learning and Rule Based Heuristics for Named Entity Recognition
This paper, submitted as an entry for the NERSSEAL-2008 shared task, describes a system built for Named Entity Recognition for South and South East Asian Languages. Our paper combines machine learning techniques with language-specific heuristics to model the problem of NER for Indian languages. The system has been tested on five languages: Telugu, Hindi, Bengali, Urdu and Oriya. It uses CRF (Co...